Distilling Model Knowledge

Author

  • George Papamakarios
Abstract

Top-performing machine learning systems, such as deep neural networks, large ensembles and complex probabilistic graphical models, can be expensive to store, slow to evaluate and hard to integrate into larger systems. Ideally, we would like to replace such cumbersome models with simpler models that perform equally well. In this thesis, we study knowledge distillation, the idea of extracting the knowledge contained in a complex model and injecting it into a more convenient model. We present a general framework for knowledge distillation, whereby a convenient model of our choosing learns how to mimic a complex model, by observing the latter’s behaviour and being penalized whenever it fails to reproduce it. We develop our framework within the context of three distinct machine learning applications: (a) model compression, where we compress large discriminative models, such as ensembles of neural networks, into models of much smaller size; (b) compact predictive distributions for Bayesian inference, where we distil large bags of MCMC samples into compact predictive distributions in closed form; (c) intractable generative models, where we distil unnormalizable models such as RBMs into tractable models such as NADEs. We contribute to the state of the art with novel techniques and ideas. In model compression, we describe and implement derivative matching, which allows for better distillation when data is scarce. In compact predictive distributions, we introduce online distillation, which allows for significant savings in memory. Finally, in intractable generative models, we show how to use distilled models to robustly estimate intractable quantities of the original model, such as its intractable partition function.
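
To make the penalty concrete, here is a minimal sketch of the matching objective, assuming hypothetical callables `teacher_f`/`student_f` for the two models' outputs and `teacher_grad`/`student_grad` for their derivatives with respect to the input; squared error and the weighting `lam` are illustrative choices, not the thesis's exact formulation. The derivative term is the idea behind derivative matching: it extracts extra training signal from each input when data is scarce.

```python
import numpy as np

def distillation_loss(teacher_f, teacher_grad, student_f, student_grad,
                      xs, lam=1.0):
    # Penalise the student whenever it fails to reproduce the teacher's
    # behaviour at the inputs xs: match the function values, and
    # (derivative matching) also match the derivatives w.r.t. the input.
    value_err = np.mean((teacher_f(xs) - student_f(xs)) ** 2)
    deriv_err = np.mean((teacher_grad(xs) - student_grad(xs)) ** 2)
    return value_err + lam * deriv_err
```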

Similar articles

Distilling the Knowledge in a Neural Network

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions [3]. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large n...
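
The ensemble prediction referred to above is simply an average of the members' predictive distributions; a minimal sketch, assuming scikit-learn-style models exposing a `predict_proba` method, shows why every prediction requires storing and evaluating every member:

```python
import numpy as np

def ensemble_predict(models, x):
    # Average the predictive distributions of all ensemble members.
    # Every member must be stored and evaluated for each prediction,
    # which is what makes deploying a large ensemble expensive.
    return np.mean([m.predict_proba(x) for m in models], axis=0)
```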

Distilling Knowledge Using Parallel Data for Far-field Speech Recognition

To improve far-field speech recognition, this paper proposes to distill knowledge from a close-talking model into a far-field model using parallel data. The close-talking model is called the teacher model and the far-field model the student model. The student model is trained to imitate the output distributions of the teacher model. This constraint can be re...
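
The parallel data makes the targets cheap to obtain: the teacher scores the close-talking recording while the student scores the far-field recording of the same utterance. A minimal sketch of the imitation loss, with cross-entropy assumed for illustration:

```python
import numpy as np

def parallel_data_loss(teacher_probs, student_probs):
    # Cross-entropy between the teacher's output distribution (computed
    # on the close-talking recording) and the student's output
    # distribution (computed on the far-field recording of the same
    # utterance); minimising it makes the student imitate the teacher.
    return -np.mean(np.sum(teacher_probs * np.log(student_probs + 1e-12),
                           axis=-1))
```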

Distilling Knowledge from an Ensemble of Models for Punctuation Prediction

This paper proposes an approach to distill knowledge from an ensemble of models into a single deep neural network (DNN) student model for punctuation prediction, making the DNN student mimic the behavior of the ensemble. The ensemble consists of three individual models. Kullback-Leibler (KL) divergence is used to minimize the difference between the output distribution of the DNN st...
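
A minimal NumPy version of that objective, the KL divergence from the ensemble's averaged output distribution to the student's (the smoothing constant is an implementation assumption):

```python
import numpy as np

def kl_to_ensemble(ensemble_probs, student_probs):
    # KL(ensemble || student): the divergence minimised so that the
    # single DNN student mimics the averaged output distribution of
    # the ensemble.
    p, q = ensemble_probs, student_probs
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)),
                          axis=-1))
```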

Distilling Intractable Generative Models

A generative model’s partition function is typically expressed as an intractable multi-dimensional integral, whose approximation presents a challenge to numerical and Monte Carlo integration. In this work, we propose a new estimation method for intractable partition functions, based on distilling an intractable generative model into a tractable approximation thereof, and using the latter for pr...
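
A standard way to use such a tractable approximation q, sketched here under the assumption that importance sampling is acceptable (the paper's exact estimator may differ), is as a proposal distribution: since q can be both sampled and evaluated exactly, Z = E_q[p*(x)/q(x)] can be estimated by Monte Carlo.

```python
import numpy as np

def estimate_log_z(log_p_unnorm, q_sample, q_log_prob, n=10_000):
    # Z = E_q[ p*(x) / q(x) ] for any q we can both sample from and
    # evaluate exactly -- which is what the tractable distilled
    # model provides.
    xs = q_sample(n)                            # draw from the distilled model
    log_w = log_p_unnorm(xs) - q_log_prob(xs)   # log importance weights
    m = log_w.max()                             # log-sum-exp stabilisation
    return m + np.log(np.mean(np.exp(log_w - m)))
```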

How can Diagrammatic Conceptual modelling Support Knowledge Management?

Traditionally, venues publishing Knowledge Management research have separated concerns between two viewpoints that rarely converge into holistic approaches: one is the tradition of Artificial Intelligence research, where "Knowledge Management" is often employed as an umbrella term for a variety of semantic technologies, knowledge representation and knowledge discovery ...

TandemNet: Distilling Knowledge from Medical Images Using Diagnostic Reports as Optional Semantic References

In this paper, we introduce the semantic knowledge of medical images from their diagnostic reports to guide network training and to provide an interpretable prediction mechanism with our proposed multimodal neural network, TandemNet. Inside TandemNet, a language model is used to represent the report text, and it cooperates with the image model in a tandem scheme. We propose a no...
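
The tandem scheme can be pictured as two feature branches, one per modality, feeding a shared predictor. The stub below is a generic multimodal-fusion sketch under stated assumptions (projection weights, additive fusion), not TandemNet's actual architecture:

```python
import numpy as np

def tandem_predict(image_features, text_features, w_img, w_txt, w_out):
    # Project image and report-text features into a shared space,
    # combine them additively, and predict class probabilities.
    h = np.tanh(image_features @ w_img + text_features @ w_txt)
    logits = h @ w_out
    logits -= logits.max(-1, keepdims=True)     # numerical stability
    return np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
```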

Journal:
  • CoRR

Volume: abs/1510.02437  Issue: -

Pages: -

Published: 2015